Recovery of Distorted Document Images from Bound Volumes
Abstract
Recovering document images scanned from thick bound volumes is necessary for both human reading and text retrieval. The main problem with scanning bound volumes is that perspective distortion always occurs. This distortion degrades the scanned images in two ways: 1) a shadow in the book spine area, and 2) warping of the words in the shadow. In this paper, we develop a restoration system to solve these two problems. First, the boundary between the shadow and the clean area is detected. Then the system applies a modified Niblack's method to remove the shadow. The system uses connected component analysis to improve noise reduction and to adjust the location and orientation of the warped words in the shadow area, i.e. the words within the boundary detected earlier. The implementation results for each step are presented. Our system will be used in the text retrieval projects for the National Archives of Singapore and the NUS Digital Library.
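The abstract does not say how Niblack's method was modified, so the following is only a minimal sketch of the standard Niblack rule, in which each pixel is compared against a local threshold T = m + k * s computed from the mean m and standard deviation s of its neighbourhood. The function name `niblack_binarize`, the window size, and the value k = -0.2 are illustrative assumptions, not values taken from the paper.

```python
from statistics import mean, pstdev


def niblack_binarize(image, window=3, k=-0.2):
    """Binarize a grayscale image (list of rows of ints) with the standard Niblack rule.

    A pixel is marked as ink (1) when its value is below T = m + k * s,
    where m and s are the mean and standard deviation of its local window.
    NOTE: window and k are assumed, illustrative defaults.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            patch = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            threshold = mean(patch) + k * pstdev(patch)
            result[y][x] = 1 if image[y][x] < threshold else 0
    return result


# Usage: a flat bright page with a single dark pixel standing in for ink.
page = [[200] * 5 for _ in range(5)]
page[2][2] = 50
binary = niblack_binarize(page)
```

Because the threshold follows the local mean, a dark stroke can still be separated from its surroundings when the whole neighbourhood is dimmed by a shadow, which is the property that makes this family of methods a natural fit for the spine area.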
Similar resources
Removing Geometric Distortion from Text Using Geometric Information of Text Lines
Document images produced by scanners or digital cameras usually have photometric and geometric distortions. If either of these effects distorts a document, recognition of words from such a document image using OCR is subject to errors. In this paper we propose a novel approach to substantially remove geometric distortion from document images. In this method we first extract document lines from do...
Background-Based Delineation of Internal Tumor Volumes on Static Positron Emission Tomography in a Phantom Study
Objective(s): Considering that the standardized uptake value (SUV) of normal lung tissue is expressed as x±SD, x+3×SD could be considered as the threshold value to outline the internal tumor volume (ITV) of a lung neoplasm. Methods: Three hollow models were filled with 55.0 kBq/mL fluorine-18 fluorodeoxyglucose (18F-FDG) to represent tumors. The models were fixed to a barrel filled w...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing step by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions. Both deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines by a dynamic local connectivity map and then applying a l...
Script and Language Identification in Degraded and Distorted Document Images
This paper reports a statistical identification technique that differentiates scripts and languages in degraded and distorted document images. We identify scripts and languages through document vectorization, which transforms each document image into an electronic document vector that characterizes the shape and frequency of the contained character and word images. We first identify scripts bas...